Abstract



We explicitly show the connection between the protein folding problem and the
spin glass transition. This connection is then used to identify the quantities that
are required to describe the transition. A possible way of observing the spin
glass transition is proposed.
If a protein had to explore all the possible configurations to reach its biologically active
form, the time required would be $\sim 10^{10}$ years, whereas real proteins fold in a time
of the order of a few milliseconds to a few seconds [?]. This is the Levinthal paradox, whose
resolution seems to hinge on the similarities between this biologically important problem and
the concept of a rugged free energy landscape of a spin glass in condensed matter physics.
Attempts have also been made to study the dynamics directly, but so far these are
necessarily restricted to small chains. Our purpose in this paper is to establish the connection
with the spin glass quantitatively, and thereby identify the appropriate quantities that one
should look at in experiments.

The first idea that there is a connection with spin glass came from the attempts to use a
hierarchical tree structure of time scales [?], and the random energy model [?], to rationalize
the observed dynamics of proteins [?]. The connection was made apparent by the seminal
work of Garel and Orland [7], and, independently, of Gutin and Shakhnovich [8]. From these
evolved the idea of statistical proteins. The key points in this approach are that proteins
are not simple homopolymers and that functionally similar proteins of different species need
not have identical backbone structures. The variation in an ensemble of such similar proteins
can be thought of as a random (albeit correlated) sequence of monomers along the backbone.
This randomness leads to random interactions among the monomers. Since the monomers
do not change positions once fixed, one has to average physical properties over
the random realizations of monomer configurations (quenched averaging). There will be
quantities whose average over the ensemble is the same as that of typical samples, while
there will be others for which this is not true. The former are called "self-averaging".
This class represents the generic properties of the proteins, while the second class of
non-self-averaging quantities is specific to samples (species). The second class is
expected to play a significant role in mutants.

Granted the idea of statistical proteins, continuum path integral formulations were used
in Refs. [7,8] to calculate physical properties by using the replica theory. In approximate
calculations, similarities with the infinite state Potts glass were noted. In particular, Ref. [7]
shows the importance of a finite bond length for a sensible theory. It was soon realized that
the monomers in a real protein are not just distributed at random, but there is a correlation,
at least to some extent, and the ensemble should be suitably restricted.

We take a model of finite bond lengths and exploit the correlation to choose a certain
combination of variables as the independent random entities. Unlike the previous approaches, we
use the bonds as the natural variables instead of the absolute coordinates of the monomers.
We first establish that, for a correlated distribution of monomers, the problem can be mapped
to a spin glass problem with a long range interaction whose nature is determined by the
correlation. We then identify the parameter that describes the spin glass state of the
protein; this parameter is different from the measures of size used for homopolymers. For the
particular type of correlations considered, we obtain the exact scaling behaviour with the
length. We then discuss how the parameter can be measured and how the existence of a spin
glass phase can be verified. The relevance to dynamics is also discussed.

Let us start with the Kuhn model for a polymer consisting of bonds of unit length freely
joined at the ends, so that each bond can rotate freely without any hindrance [9]. See
Fig. 1. We consider the problem in d dimensions, so that the orientation of each bond is given
by a d-dimensional vector $\mathbf{s}_p$, with p going from 1 to N, the total number of bonds. Since
the polymer configurations can be completely specified by the N direction vectors, we can
as well consider the equivalent problem of d-component spins arranged in one dimension,
representing the one dimensionality of the chain. It is in this picture that we formulate the
problem.
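The bond picture can be illustrated with a minimal Python sketch. It assumes d = 3, a 0-based labelling in which bond i joins monomers i and i+1, and hypothetical helper names (`kuhn_chain`, `monomer_positions`, `separation`) that are not part of the original formulation. It only shows that the N unit bond vectors carry the same information as the monomer coordinates.

```python
import math
import random

def random_unit_vector(d, rng):
    """Draw a direction uniformly on the unit sphere in d dimensions."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]

def kuhn_chain(n_bonds, d=3, seed=0):
    """Bond vectors of a freely jointed chain (the 'spins' of the text)."""
    rng = random.Random(seed)
    return [random_unit_vector(d, rng) for _ in range(n_bonds)]

def monomer_positions(bonds):
    """Rebuild monomer coordinates from the bonds, with monomer 0 at the origin."""
    d = len(bonds[0])
    positions = [[0.0] * d]
    for s in bonds:
        positions.append([r + x for r, x in zip(positions[-1], s)])
    return positions

def separation(bonds, p, q):
    """Vector between monomers p < q, written as the sum of bonds p..q-1."""
    d = len(bonds[0])
    return [sum(bonds[i][k] for i in range(p, q)) for k in range(d)]

bonds = kuhn_chain(20)
pos = monomer_positions(bonds)
r = separation(bonds, 3, 11)
direct = [b - a for a, b in zip(pos[3], pos[11])]
```

Here `r` and `direct` agree up to rounding, which is the statement that distances between monomers can be expressed through the spins alone.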

Any two monomers p and q interact on contact with a coupling proportional to $\epsilon_{pq}$, where
$\epsilon_{pq}$ is a quenched random variable. For a contact potential, this interaction is
$\epsilon_{pq}\,\delta(\mathbf{r}_{pq})$, where $\mathbf{r}_{pq}$ is the distance vector between monomers (sites) p and q,
and $\delta(\mathbf{x}) = 0$ for $\mathbf{x} \neq 0$ and $= 1$ for $\mathbf{x} = 0$. In terms of the bond vectors,
$\mathbf{r}_{pq} = \sum_{i=p}^{q-1} \mathbf{s}_i$, the Hamiltonian can be written as

    $$ H = \sum_{p,q} \epsilon_{pq}\, \delta\Big( \sum_{i=p}^{q-1} \mathbf{s}_i / (q-p) \Big). \qquad (1) $$

This is the random version of the Domb-Joyce model for the self avoiding walk, and a positive $\epsilon_{pq}$
would represent a repulsive (self avoiding) term [10]. We replace this contact delta potential
$\delta(\mathbf{R})$ by a smoother potential $1 - R^2$. Note that for the Kuhn model $0 \le |\mathbf{R}| \le 1$, so that
the replacement is equivalent to changing the discrete levels 0 and 1 to a band between 0
and 1. This leads to much simplification afterwards. In a sense, two monomers interact
with a truncated quadratic potential that can be repulsive or attractive. It is truncated simply
because the distance between the two cannot exceed a limit, a feature of the Kuhn model.
The proposed Hamiltonian is therefore

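    $$ H = -\sum_{i<j} \frac{\epsilon_{ij}}{(j-i+1)^{2}} \sum_{p,q \in [i,j]} \mathbf{s}_p \cdot \mathbf{s}_q, \qquad (2) $$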

ignoring a constant (disorder dependent) term that does not contribute to the
thermodynamics. The Hamiltonian in this form involves two sums. Given a pair i < j, which defines a
cluster in the one dimensional chain, the inner sum runs over all the pairs
in the cluster. Let us rearrange the terms and do the outer sum first. For a pair p, q, sum
over all the clusters to which it belongs, to obtain terms of the type
$\sum_{i \le p,\, j \ge q} \epsilon_{ij}\,(j-i+1)^{-2}$.
The correlation of the monomers along the backbone (i.e. of $\epsilon_{pq}$) is now invoked to write
this sum over i and j in terms of independent random elements. Specifically we choose

    $$ \sum_{m,n \ge 0} \epsilon_{p-n,\,q+m}\, [(q-p) + (m+n) + 1]^{-2} = \frac{J_{pq}}{|q-p|^{\sigma}}, \qquad (3) $$

with $J_{pq}$ as the independent random variable and the exponent $\sigma$ as a measure of the
correlation. It is not necessarily true that all correlations can be expressed in such
a simple form, but this is the simplest situation. More complex situations can be handled by
considering correlated, and, if necessary, inhomogeneously distributed, $J_{pq}$. This does not
invalidate the basic concepts introduced here. The ensemble we will be considering involves
polymers that have a particular type of correlation, as given by the distribution of the
couplings and the value of $\sigma$.
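The interchange of the two sums can be checked numerically. The following Python sketch assumes 0-based bond labels, counts each pair p < q once, and uses random numbers as stand-ins for the overlaps of the bond vectors; the function names are illustrative only. It compares the cluster form (outer sum over clusters [i, j]) with the pair form, in which every pair carries the accumulated weight of all clusters containing it.

```python
import random

def cluster_form(eps, dots):
    """Outer sum over clusters i < j, inner sum over pairs p < q inside [i, j]."""
    n = len(eps)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            weight = eps[i][j] / (j - i + 1) ** 2
            inner = sum(dots[p][q]
                        for p in range(i, j + 1)
                        for q in range(p + 1, j + 1))
            total -= weight * inner
    return total

def effective_coupling(eps, p, q):
    """Sum of eps_ij / (j - i + 1)^2 over all clusters [i, j] that contain p < q."""
    n = len(eps)
    return sum(eps[i][j] / (j - i + 1) ** 2
               for i in range(p + 1)
               for j in range(q, n))

def pair_form(eps, dots):
    """Same energy, with the sum over clusters done first for every pair p < q."""
    n = len(eps)
    return -sum(effective_coupling(eps, p, q) * dots[p][q]
                for p in range(n)
                for q in range(p + 1, n))

n = 8
rng = random.Random(1)
# eps[i][j] for i < j: uncorrelated Gaussian numbers, an assumption made only for this check
eps = [[rng.gauss(0.0, 1.0) if j > i else 0.0 for j in range(n)] for i in range(n)]
# stand-ins for the overlaps s_p . s_q (any symmetric matrix works for the identity)
dots = [[0.0] * n for _ in range(n)]
for p in range(n):
    for q in range(p + 1, n):
        dots[p][q] = dots[q][p] = rng.uniform(-1.0, 1.0)

e_cluster = cluster_form(eps, dots)
e_pair = pair_form(eps, dots)
```

The two energies coincide up to rounding. The accumulated weight returned by `effective_coupling` is the quantity that the choice made in the text replaces by a single random coupling with a power law dependence on the separation.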

The Hamiltonian now takes a form familiar in the spin glass context, namely

    $$ H = -\sum_{p<q} \frac{J_{pq}}{|q-p|^{\sigma}}\, \mathbf{s}_p \cdot \mathbf{s}_q, \qquad (4) $$

where each $J_{pq}$ is an independent normal variable with a distribution
$P(J_{pq}) = (2\pi J)^{-1/2} \exp(-J_{pq}^{2}/2J)$. As a spin glass model, this is a generalization of the
long range Ising model considered in Ref. [11] to vector spins [12].
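A minimal Python sketch of this long range vector spin glass energy is given below. It assumes d = 3, zero-mean Gaussian couplings whose variance plays the role of J, and hypothetical names (`sample_couplings`, `energy`); the value sigma = 0.75 is chosen only for illustration.

```python
import math
import random

def random_unit_vector(d, rng):
    """Draw a direction uniformly on the unit sphere in d dimensions."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]

def sample_couplings(n, variance=1.0, seed=0):
    """Independent Gaussian couplings J_pq (mean 0, given variance) for p < q."""
    rng = random.Random(seed)
    sd = math.sqrt(variance)
    return {(p, q): rng.gauss(0.0, sd) for p in range(n) for q in range(p + 1, n)}

def energy(bonds, couplings, sigma):
    """H = - sum_{p<q} J_pq (s_p . s_q) / |q - p|^sigma."""
    h = 0.0
    for (p, q), j_pq in couplings.items():
        overlap = sum(a * b for a, b in zip(bonds[p], bonds[q]))
        h -= j_pq * overlap / (q - p) ** sigma
    return h

rng = random.Random(42)
n = 12
bonds = [random_unit_vector(3, rng) for _ in range(n)]
couplings = sample_couplings(n, variance=1.0, seed=7)

h_coil = energy(bonds, couplings, sigma=0.75)
h_flipped = energy([[-x for x in s] for s in bonds], couplings, sigma=0.75)
rod = [[1.0, 0.0, 0.0] for _ in range(n)]   # fully stretched configuration
h_rod = energy(rod, couplings, sigma=0.75)
```

The energy is unchanged when all bonds are reversed, as expected for a Hamiltonian that is bilinear in the spins, and for the rod it reduces to minus the sum of the distance-weighted couplings.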



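A spin glass transition is described in the h, T plane, where h is the magnetic field that
orients the spins in a particular direction. The thermodynamic transition is heralded by a
diverging spin glass susceptibility, $\chi_{SG}$, while the uniform, linear susceptibility, $\chi$, remains
finite at the transition [13]. One sees a cusp in $\chi$ at $T_c$. In terms of the correlation functions,
the two susceptibilities can be written as

    $$ \chi = N^{-1} \sum_{ij} \overline{\langle \mathbf{s}_i \cdot \mathbf{s}_j \rangle}, \quad \text{and} \quad \chi_{SG} = N^{-1} \sum_{ij} \overline{\langle \mathbf{s}_i \cdot \mathbf{s}_j \rangle^{2}}, \qquad (5) $$

where $\langle \ldots \rangle$ denotes the thermal average and the overbar the average over the disorder.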
We shall restrict ourselves to the high temperature disordered phase, so that no special
direction need be chosen. (In general, one should discuss the longitudinal and transverse
correlations [12].) The important point to keep in mind is the extensivity of the two
susceptibilities, i.e., the total susceptibilities (both linear and spin glass) are proportional to the
number of spins, so that the densities defined above are independent of N. In addition to the
divergence of $\chi_{SG} \sim |T - T_c|^{-\gamma}$, there is also a diverging correlation length $\xi \sim |T - T_c|^{-\nu}$,
which describes the behavior of the correlation function $g_{ij} = \overline{\langle \mathbf{s}_i \cdot \mathbf{s}_j \rangle^{2}}$. The decay of the
correlation at $T_c$ is described by the exponent $\eta$, $g_{ij} \sim |j - i|^{-(-1+\eta)}$. The response of the spin
glass to an external field can be written as $m = \chi h + \chi_{nl} h^{3}$, where $m = N^{-1} \sum_i \langle \mathbf{s}_i \rangle$ is the net
magnetization in the field. For symmetric distributions, it is known that (a) the nonlinear
susceptibility $\chi_{nl}$ is related to the spin glass susceptibility $\chi_{SG}$, a relation that is often used
to infer $\chi_{SG}$ from experiments [?], and (b) only the diagonal correlations contribute to $\chi$.



We now translate these spin glass quantities to polymers. The spins in our problem
correspond to the bonds of the polymer, so that the total magnetization $\mathbf{M} = \sum_i \mathbf{s}_i$
corresponds to the end-to-end vector of the polymer. This is the quantity of interest in pure
problems [?]. Unless the polymer is in a stretched state, the configurational average of $\mathbf{M}$ is
expected to be zero, and the size R of the polymer is given by the root mean square end-to-end
distance. The susceptibility is given by the variance of $\mathbf{M}$, and so, with zero net
magnetization, $\chi = \langle M^{2} \rangle / N$. The linear susceptibility of the spin system is therefore related to the
size of the polymer. Since, as a density, $\chi$ is independent of N, we find $R \sim N^{1/2}$, a result
well known from the random walk. The random walk exponent appears because we are ignoring
self avoidance. In the spin glass case, $\chi$ remains finite for all T, and, therefore,
the size as measured by R in our model will always be proportional to $N^{1/2}$, except that
the temperature dependence in the strict thermodynamic limit will show a singularity. In
contrast, close to the transition temperature $\chi_{SG}$ shows a different behavior. A finite size
scaling analysis [?] gives $\chi_{SG} \sim N^{\gamma/\nu}$, while away from $T_c$ it remains O(1).
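The random walk result can be reproduced for the free chain with a short Monte Carlo estimate. The Python sketch below assumes d = 3, no interactions (all couplings set to zero), and arbitrary choices of chain lengths and sample size; it estimates the mean square end-to-end distance, the corresponding susceptibility density, and the size exponent.

```python
import math
import random

def mean_square_end_to_end(n_bonds, samples, d=3, seed=0):
    """Monte Carlo estimate of <M^2> for a freely jointed chain of unit bonds."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        m = [0.0] * d
        for _ in range(n_bonds):
            v = [rng.gauss(0.0, 1.0) for _ in range(d)]
            norm = math.sqrt(sum(x * x for x in v))
            for k in range(d):
                m[k] += v[k] / norm          # add one random unit bond
        total += sum(x * x for x in m)
    return total / samples

sizes = [25, 100]
r2 = {n: mean_square_end_to_end(n, samples=1000) for n in sizes}

# susceptibility density chi = <M^2>/N, expected to be independent of N
chi = {n: r2[n] / n for n in sizes}

# size exponent from R ~ N^nu, using R^2 = <M^2>
nu = 0.5 * math.log(r2[100] / r2[25]) / math.log(100 / 25)
```

Within statistical error the estimate of `chi` stays close to 1 for both lengths and `nu` is close to 1/2, in line with the argument above.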



We, therefore, propose that for the folding problem the appropriate quantity to look at
is

    $$ \Phi = \sum_{ij} \overline{\langle \mathbf{s}_i \cdot \mathbf{s}_j \rangle^{2}}, \qquad (6) $$

which goes like $N^{1+\gamma/\nu}$ in the critical region, but like N for $T \gg T_c$. This is a different
measure of size from that conventionally used in pure problems. Its importance can be understood
in terms of dynamics, to be discussed below. It is possible to connect this $\Phi$ to the size of
the polymer in the following way:
(7)



and if we assume the dominance of the diagonal terms then

    $$ \Phi \sim (R^{2})^{2}. \qquad (8) $$

This is a justifiable assumption, since we do not require any new exponent to describe the spin
glass. Similar scaling is expected for the radius of gyration also. This gives an experimentally
accessible quantity that can be probed in scattering experiments (see below).

Let us now go back to our Eq. (4). The spin glass problem can be studied in the replica
framework following the method of Ref. [11]. Details are skipped. The relevant results
we need here are the following: (1) There is a spin glass transition for $1/2 < \sigma < 1$. (2)
For $\sigma < 2/3$, the behavior is mean-field like, and can as well be described by an infinitely
weak infinite range model [11]. (3) Fluctuations play a major role for $\sigma > 2/3$, and the one
dimensional problem is expected to behave like a short ranged spin glass.

As already pointed out after Eq. (6), the behaviour we want to see comes from $\gamma/\nu$ which,
by a scaling relation, is equal to $2 - \eta$ [13]. Our aim is therefore to calculate $\eta$. Now, long
range interactions do not require any renormalization [11]. As a result, the exponent $\eta$ is
known exactly to be $\eta = 3 - 2\sigma$. Hence, with $1 + \gamma/\nu = 3 - \eta = 2\sigma$, the behavior of the
fold parameter in the simple model is determined as

    $$ \Phi \sim N^{2\sigma} \quad \text{for } T \approx T_c, $$
    $$ \Phi \sim N \quad \text{for } T \gg T_c, \qquad (9) $$

for $2/3 < \sigma < 1$. The restriction on $\sigma$ is needed because finite size scaling is not valid
for mean field theories [?]. In other words, for $\sigma < 2/3$, no simple scaling form for $\Phi$ is
expected near the transition.
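The chain of exponent relations used above is short enough to be written out as a small Python helper. The function name and the returned dictionary keys are illustrative choices; the relations themselves are the ones quoted in the text.

```python
def fold_exponents(sigma):
    """Exponents of the long range model for 1/2 < sigma < 1.

    eta = 3 - 2*sigma, gamma/nu = 2 - eta, and the fold parameter scales
    as N**(1 + gamma/nu) near T_c. The finite size scaling form is only
    expected to hold for sigma > 2/3.
    """
    if not 0.5 < sigma < 1.0:
        raise ValueError("a spin glass transition is expected only for 1/2 < sigma < 1")
    eta = 3.0 - 2.0 * sigma
    gamma_over_nu = 2.0 - eta
    return {
        "eta": eta,
        "gamma_over_nu": gamma_over_nu,
        "fold_exponent": 1.0 + gamma_over_nu,   # equals 2*sigma
        "scaling_regime": sigma > 2.0 / 3.0,    # False in the mean-field range
    }

result = fold_exponents(0.75)
```

For sigma = 0.75 this gives eta = 1.5, gamma/nu = 0.5 and a fold exponent of 1.5, compared with the exponent 1 that holds far above the transition.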

A direct way of measuring the fold parameter $\Phi$ is to devise an experiment that stretches
the polymer. In the spin glass language, the external field tries to orient all the spins along
its direction. This ordered state corresponds to a stretched rod-like configuration of the
polymer. The analog of the magnetic field in the spin glass is therefore a stretching force, as can
be obtained by pulling the polymer at its two ends (say by putting tunable charges at the ends),
or in extensional flows that lead to a coil-stretch transition. It is therefore suggested that, to
elucidate the spin glass type behavior, it is necessary to study the response of proteins in the
glassy state to a (possibly oscillatory) stretching force, and to look for the nonlinear response.
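As a reference point for such a measurement, the linear and nonlinear response of the free chain can be worked out explicitly. The Python sketch below assumes d = 3, non-interacting unit bonds, and a force h measured in units of the thermal energy per bond length, for which the mean bond projection is the Langevin function coth(h) - 1/h. The two-field extraction in `response_coefficients` is one simple choice, not a prescription taken from the text.

```python
import math

def langevin(h):
    """Mean bond projection along the force for free unit bonds in d = 3."""
    if abs(h) < 1e-6:
        return h / 3.0                      # leading term, avoids 0/0 at h = 0
    return 1.0 / math.tanh(h) - 1.0 / h

def response_coefficients(m_of_h, h):
    """Estimate chi and chi_nl in m = chi*h + chi_nl*h**3 from m(h) and m(2h)."""
    m1, m2 = m_of_h(h), m_of_h(2.0 * h)
    chi = (8.0 * m1 - m2) / (6.0 * h)
    chi_nl = (m2 - 2.0 * m1) / (6.0 * h ** 3)
    return chi, chi_nl

chi, chi_nl = response_coefficients(langevin, 0.05)
```

For the free chain the small-field expansion is h/3 - h^3/45, so the sketch returns values close to 1/3 and -1/45. A spin glass type transition would show up as a strong enhancement of the cubic coefficient relative to this featureless background.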

Another way of measuring $\Phi$ would be to look at the structure factor, especially at the
leading correction (in momenta) to the small angle scattering. Let us for simplicity assume
that optically the protein behaves as a homopolymer, i.e., in scattering, all the monomers
behave identically, and that the thermal averaging can be approximated by a Gaussian average.
The structure factor (see, e.g., Ref. [?]) for a given realization of the polymer is then given
by $\exp(-k^{2} R^{2})$ for a wavevector k, where $R^{2} = N^{-2} \sum_{m>n} \langle (\mathbf{r}_m - \mathbf{r}_n)^{2} \rangle$ is the square radius
of gyration. A disorder averaged structure factor then gives $R^{2} \sim \chi$ as the leading term in
small angle scattering (i.e., $k \to 0$). The next correction depends on $(R^{2})^{2}$, which we argued
to have the same scaling behavior as $\Phi$.
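The small angle expansion of the disorder averaged structure factor can be illustrated numerically. In the Python sketch below the sample-to-sample values of the square radius of gyration are drawn from a gamma distribution, which is purely a placeholder for the unknown distribution over sequences; the function names are likewise illustrative.

```python
import math
import random

def averaged_structure_factor(k, r2_samples):
    """Average of exp(-k^2 R^2) over the realizations."""
    return sum(math.exp(-k * k * r2) for r2 in r2_samples) / len(r2_samples)

def small_angle_expansion(k, r2_samples, order=2):
    """1 - k^2 mean(R^2) (+ k^4 mean((R^2)^2) / 2 at second order)."""
    n = len(r2_samples)
    mean_r2 = sum(r2_samples) / n
    mean_r4 = sum(x * x for x in r2_samples) / n
    value = 1.0 - k ** 2 * mean_r2
    if order >= 2:
        value += 0.5 * k ** 4 * mean_r4
    return value

rng = random.Random(3)
# hypothetical sample-to-sample distribution of R^2 (placeholder only)
r2_samples = [rng.gammavariate(4.0, 1.0) for _ in range(5000)]

k = 0.1
exact = averaged_structure_factor(k, r2_samples)
first = small_angle_expansion(k, r2_samples, order=1)
second = small_angle_expansion(k, r2_samples, order=2)
```

At small k the first order term fixes the average of R^2, and the residual left after subtracting it is governed by the average of (R^2)^2, which is the quantity argued above to scale like the fold parameter.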

So far we have focussed on the equilibrium aspect of the problem. The important time
dependent activities (i.e. biological functions) involve rearrangements (release of strains) through
a sequence of functionally important motions (FIMs). FIMs are the movements of certain
segments of the molecules involving or surrounding the active site. In our bond picture, the
motion of a block from i to j can be executed by an interchange of the two spins $\mathbf{s}_i$ and
$\mathbf{s}_j$. For example, a nearest neighbor i, j interchange corresponds to the Verdier-Stockmayer
type moves [?], while the next nearest neighbor interchange corresponds to a crankshaft
motion [?]. The FIMs can then be identified as two spin interchanges (blocks containing the
active site), and one needs to classify them according to time scales. The relevant quantity
to describe such motions in the native state is the time correlation function

    $$ \Phi_1(\tau) = \sum_{ij} \langle [\mathbf{s}_i(t) \cdot \mathbf{s}_j(t)]\, [\mathbf{s}_i(t+\tau) \cdot \mathbf{s}_j(t+\tau)] \rangle, \qquad (10) $$



where the average is now a time average. For $\tau \to \infty$, $\Phi_1(\infty)$ is the counterpart of the
Edwards-Anderson order parameter for spin glasses. The fold parameter $\Phi$ comes from
Eq. (10) if the limits are taken in the reverse order, i.e., $\lim_{N\to\infty} \lim_{\tau\to\infty}$. It is known that,
unlike $\Phi$, $\Phi_1(\infty)$ is not a self-averaging quantity.
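The two ingredients of this dynamical picture, the spin interchange move and the time averaged correlation, can be sketched in a few lines of Python. The trajectory format (a list of bond configurations at successive times), the 0-based labels and the function names are assumptions made for the illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def overlap_matrix(bonds):
    """Matrix of overlaps s_i . s_j for one configuration."""
    n = len(bonds)
    return [[dot(bonds[i], bonds[j]) for j in range(n)] for i in range(n)]

def phi1(trajectory, tau):
    """Time average of sum_ij [s_i(t).s_j(t)] [s_i(t+tau).s_j(t+tau)]."""
    frames = [overlap_matrix(b) for b in trajectory]
    steps = len(frames) - tau
    if steps <= 0:
        raise ValueError("trajectory too short for this tau")
    n = len(trajectory[0])
    total = 0.0
    for t in range(steps):
        a, b = frames[t], frames[t + tau]
        total += sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    return total / steps

def swap_bonds(bonds, i, j):
    """Two-spin interchange: moves the block between i and j, nothing else."""
    new = [list(s) for s in bonds]
    new[i], new[j] = new[j], new[i]
    return new

def end_to_end(bonds):
    return [sum(s[k] for s in bonds) for k in range(len(bonds[0]))]

bonds = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
swapped = swap_bonds(bonds, 1, 2)              # nearest neighbor interchange
trajectory = [bonds, swapped, bonds, swapped]  # toy dynamics flipping back and forth

phi_short = phi1(trajectory, 1)
phi_return = phi1(trajectory, 2)
```

In this toy trajectory the interchange leaves the end-to-end vector untouched, while the correlation drops from 5 at a lag that returns to the same configuration to 3 at a lag that connects the two configurations, so the quantity is sensitive to block rearrangements that the overall size does not see.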

The importance of self-averaging quantities in the protein context is that for such a 
quantity any typical sample behaves like the average one. In contrast, large sample to 
sample fluctuations are expected in non-self-averaging quantities. For biological activity, 
mutants behave differently, mainly because FIMs get modified. It is, therefore, gratifying 
to find that the measure $\Phi_1$ introduced above has the non-self-averaging property that can
distinguish a mutant or denatured protein from the native one. 

In summary, we have shown (in the spirit of lattice gas models of the liquid gas transition)
that for a correlated heteropolymer, the bond variables are the suitable variables, and in
terms of these, the phase transition in the protein can be described by a one dimensional
vector spin glass model with long range interaction. We identified a fold parameter that
should be the measure for the folding problem. Its exact scaling behavior under certain
circumstances has also been determined. We find exactly that for certain types of
correlations, the geometric exponents are determined completely by $\sigma$ of Eq. (3). The correlations
along the backbone can also destroy the scaling property, if $\sigma$ is large enough. In other
words, unlike the uncorrelated cases of Refs. [7,8], our observations show that proteins need
not have a generic scaling behavior, and correlations do play a major role in it. We suggest
that elastic moduli in oscillatory stretching fields would help in the identification of the spin
glass type transition in proteins, if there is one at all. Moreover, proteins are inevitably
of finite length, and therefore what one can observe is not a true transition but the finite
size scaling behavior of the spin glass transition. This in turn opens up the new possibility
of enriching our understanding of spin glasses via controlled experiments done on proteins
with easily accessible $T_c$.
FIGURES




FIG. 1. a) A segment of the Kuhn chain, and b) its one dimensional spin representation. 